AITopics | data-driven regularization

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems | Dec-25-2025, 05:51:26 GMT

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems to the Transformer and BERT in natural language. We find that when used along with widely used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results.
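The idea of stochastically transitioning between embeddings without prior information (the SSE-SE variant named in the abstract) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sse_se_indices`, the parameter `p`, and the choice to draw the replacement uniformly from all indices (including the original one) are assumptions made for illustration. During training, the swapped indices would be used for the embedding lookup in place of the originals; at evaluation time no swapping would be applied.

```python
import random


def sse_se_indices(indices, num_embeddings, p, rng):
    """Illustrative SSE-SE-style index swap (hypothetical helper).

    Each index is kept with probability 1 - p and otherwise replaced by an
    index drawn uniformly from range(num_embeddings). The uniform draw over
    all indices is an assumption of this sketch, not a claim about the paper.
    """
    swapped = []
    for idx in indices:
        if rng.random() < p:
            # Transition to a uniformly chosen embedding index.
            swapped.append(rng.randrange(num_embeddings))
        else:
            # Keep the original embedding index.
            swapped.append(idx)
    return swapped


# Example: a mini-batch of embedding indices for a table with 10 rows.
rng = random.Random(0)
batch = [3, 3, 7, 1, 0, 7]
train_batch = sse_se_indices(batch, num_embeddings=10, p=0.1, rng=rng)
```

Because only the indices fed to the embedding lookup change, a step like this can sit in front of an existing SGD training loop without touching the model or the optimizer.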

data-driven regularization, name change, stochastic shared embedding, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

End-to-end reconstruction meets data-driven regularization for inverse problems

Neural Information Processing Systems | Aug-16-2025, 23:21:26 GMT

Our work draws on ideas from learned optimization and adversarial machine learning, and attempts to combine the best features of both paradigms.

artificial intelligence, machine learning, reconstruction, (19 more...)

Neural Information Processing Systems

Country: Europe (0.67)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)

Reviews: Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems | Jan-22-2025, 21:58:26 GMT

The paper presents a novel and interesting regularization method, a theoretical analysis, and good results, yet I fear its main contributions might be limited to recommendation systems or other fields where knowledge graphs are available, are easily constructed, or, in their absence, where it is intuitively reasonable to assume a complete graph. Outside those types of tasks, I did not find the arguments for why other fields or tasks would significantly benefit from such a method intuitively compelling, despite the improved results on some NLP tasks. The simpler version of the regularizer, which in the absence of a knowledge graph assumes a complete graph, permutes embedding indices with a constant*U(1,N) probability. Despite its appealing theoretical properties, it also risks introducing a bias of its own. The results on NLP tasks did not show major improvements, and the paper lacked an explanation of why this type of regularizer would be beneficial and effective for different NLP tasks.

data-driven regularization, knowledge graph, stochastic shared embedding, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)

Reviews: Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems | Jan-22-2025, 21:58:15 GMT

The paper proposes to integrate a stochastic relabelling embedding operator within the training of a neural net. The reviewers and the area chair are convinced of the merits of the approach which comes with a theoretical justification (smoothing the Rademacher complexity in the uniform case) and solid comparative empirical evidence. The visualization of the embeddings and their interpretation (in supplementary material and in the rebuttal) are appreciated. The AC hopes that the authors will take into account the suggestions/questions in the reviews, specifically concerning the scope of the approach and its limitations, when writing the camera-ready version of the paper. Another question which comes to mind is whether the knowledge graph (e.g. as learned from a teacher network) can facilitate the training of a student network, e.g.

data-driven regularization, embedding layer, stochastic shared embedding, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.68)

Learned Regularization for Inverse Problems: Insights from a Spectral Model

Burger, Martin; Kabri, Samira

arXiv.org Artificial Intelligence | Dec-15-2023

The aim of this paper is to provide a theoretically founded investigation of state-of-the-art learning approaches for inverse problems. We give an extended definition of regularization methods and their convergence in terms of the underlying data distributions, which paves the way for future theoretical studies. Based on a simple spectral learning model previously introduced for supervised learning, we investigate some key properties of different learning paradigms for inverse problems, which can be formulated independently of specific architectures. In particular, we investigate the regularization properties, bias, and critical dependence on training data distributions. Moreover, our framework allows us to highlight and compare the specific behavior of the different paradigms in the infinite-dimensional limit.

noise, operator, regularization, (14 more...)

arXiv.org Artificial Intelligence

2312.09845

Country:

Europe > Netherlands > South Holland > Dordrecht (0.04)
Europe > Germany > Hamburg (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Explainable AI via Learning to Optimize

Heaton, Howard; Fung, Samy Wu

arXiv.org Artificial Intelligence | Jun-11-2023

Indecipherable black boxes are common in machine learning (ML), but applications increasingly require explainable artificial intelligence (XAI). The core of XAI is to establish transparent and interpretable data-driven algorithms. This work provides concrete tools for XAI in situations where prior knowledge must be encoded and untrustworthy inferences flagged. We use the "learn to optimize" (L2O) methodology wherein each inference solves a data-driven optimization problem. Our L2O models are straightforward to implement, directly encode prior knowledge, and yield theoretical guarantees (e.g. satisfaction of constraints). We also propose use of interpretable certificates to verify whether model inferences are trustworthy. Numerical examples are provided in the applications of dictionary-based signal recovery, CT imaging, and arbitrage trading of cryptoassets. Code and additional documentation can be found at https://xai-l2o.research.typal.academy.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2204.14174

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Colorado (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.84)

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Wu, Liwei; Li, Shuqing; Hsieh, Cho-Jui; Sharpnack, James L.

Neural Information Processing Systems | Mar-18-2020, 20:18:50 GMT

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information.

data-driven regularization, embedding layer, stochastic shared embedding, (2 more...)

Neural Information Processing Systems

Technology: